ABSTRACT

We investigate halo mass selection properties of red-sequence cluster finders using galaxy populations of the Millennium Simulation (MS). A clear red sequence exists for MS galaxies in massive halos at redshifts z < 1, and we use this knowledge to inform a cluster-finding algorithm applied to 500 h^-1 Mpc projections of the simulated volume. At low redshift (z = 0.4), we find that 90% of the clusters found have galaxy membership dominated by a single, real-space halo, and that 10% are blended systems for which no single halo contributes a majority of a cluster's membership. At z = 1, the fraction of blends increases to 22%, as weaker redshift evolution in observed color extends the comoving length probed by a fixed range of color. Other factors contributing to the increased blending at high z include broadening of the red sequence and confusion from a larger number of intermediate-mass halos hosting bright red galaxies of magnitude similar to those in higher mass halos. Our method produces catalogs of cluster candidates whose halo mass selection function, p(M | N_gal, z), is characterized by a bimodal log-normal model with a dominant component that closely reproduces the real-space distribution, and a redshift-dependent tail that is broader and displaced by a factor of ~2 lower in mass. We discuss implications for X-ray properties of optically selected clusters and offer ideas for improving both mock catalogs and cluster-finding in future surveys.

Key words: cosmology: clusters of galaxies, large scale structure 



1 INTRODUCTION 

The abundance and distribution of massive dark matter halos provide a sensitive probe of cosmology and theories of structure formation. The evolution of the galaxies within these halos is also strongly affected by their hosts. Clusters of galaxies are the observational realization of such halos, which has inspired multi-wavelength campaigns to find and characterize them. With the advent of large-format CCD cameras on large telescopes, which can identify galaxies to high redshifts over wide fields, there has been renewed interest in optical searches for clusters using multicolor imaging (Kaiser et al. 1998; Lubin et al. 2000; Gladders & Yee 2000, 2005; Gladders et al. 2006; Miller et al. 2005; Koester et al. 2007); see Gal (2006) for a review of optical cluster finding methods. In particular, methods which identify the cluster red sequence (Bower, Lucey & Ellis 1992; Lopez-Cruz 1997; Gladders & Yee 2000; Lopez-Cruz, Barkhouse & Yee 2004; Gal, Lubin & Squires 2005; Gladders & Yee 2005; Gladders et al. 2006; Wilson et al. 2006) have attained significant success in identifying cluster candidates over wide fields to z ~ 1 and above.

Because red sequence galaxies dominate the cluster population, include the reddest galaxies at a given redshift, and become redder with increasing redshift, restricting a search to red sequence colors approximately isolates a redshift slice. This redshift filtering increases the signal-to-noise of cluster detection by largely eliminating projection effects from unassociated structures along the line of sight. However, contamination is still expected from blue galaxies at even higher redshift than the cluster and from galaxies near enough to the cluster to lie within the narrow red sequence color region. This residual contamination is the focus of this work.

We are motivated by current red sequence based cluster searches, such as the SDSS (Koester et al. 2007; Miller et al. 2005), and in particular by those using only two filters, such as the RCS and RCS-2 (Gladders & Yee 2000; Gladders et al. 2006) and SpARCS (Wilson et al. 2006)¹. We investigate the nature of the cluster population selected by a two-filter method applied to mock galaxy samples of the Millennium Simulation (MS) (Springel et al. 2005; Croton et al. 2006; Lemson et al. 2006; Kitzbichler & White 2007). Throughout this paper, we use "clusters" to refer to objects found by the algorithm and "halos" to refer to the dark matter halos identified in the simulation using the full 3D dark matter distribution. We categorize the purity and completeness of the cluster population using joint halo-cluster membership: we take each cluster, found using the red sequence method described below, and identify the MS halos to which its galaxies belong. Our cluster finder is patterned after the scheme used in three dimensions to identify halos. We apply a circular overdensity algorithm, centered on bright z-band galaxies, to spatial projections of the galaxy populations at the discrete redshifts z = 0.41, 0.69 and 0.99.

An advantage of the Millennium Simulation is that it provides mock clusters situated in their correct cosmological context as part of the evolving cosmic web. Including the cosmic web is significant because projections of superclusters, structures that tend to align along filaments meeting at the cluster of interest, are a major source of confusion for cluster identification that is otherwise difficult to model. With the full 3D galaxy and dark matter distribution of the simulation available, we are able to monitor and isolate different physical effects which can influence red sequence cluster finding.

The outline of the paper is as follows. We describe our methods in §2 and give our findings for the MS in §3. We consider some implications and properties of the blends in §4 and discuss properties causing and correlating with the blending, which might extend beyond our particular search algorithm and simulation, in §5. We conclude in §6. The appendix compares different purity and completeness definitions in use.



2 METHODS 

The context for our study is the model of the spatial distribution of massive halos and the galaxies that inhabit them provided by the Millennium Simulation (Springel et al. 2005; Lemson et al. 2006). This is a collisionless dark matter simulation performed in a periodic cube 500 h^-1 Mpc (comoving) on a side, using 10^10 particles for a cosmology with parameters (Ω_m, Ω_Λ, σ_8, Ω_b, h, n) = (0.25, 0.75, 0.9, 0.045, 0.73, 1.0). Mock galaxies, with luminosities and colors, are generated by post-processing the dark matter halo merger trees with a semi-analytic prescription for the gas dynamics and feedback; for details, see Croton et al. (2006) and Kitzbichler & White (2007). In particular, our version is that described in detail in Croton et al. (2006), but with the updated dust prescription of Kitzbichler & White (2007), which better models dust extinction at higher redshifts.



1 For up-to-date information about the RCS and SpARCS surveys see http://www.astro.utoronto.ca/~gladders/RCS/ and http://spider.ipac.caltech.edu/staff/gillian/SpARCS
Figure 1. Distributions of r − z colors and magnitudes at z = 0.41 (left) and 0.99 (right) for all z-band magnitude-limited galaxies (top) and for those galaxies in halos with at least eight members (bottom). Contours are in steps of ~770 (left, top), ~260 (left, bottom), ~360 (right, top) and ~60 (right, bottom) galaxies. Straight lines show the color-magnitude region defining the red sequence at each redshift.



We focus our cluster finding investigation on local confusion: projections, on spatial scales < 250 h^-1 Mpc, of structures near a target halo. At these redshifts, such scales will be barely resolved by photometric redshifts of the next-generation surveys (DES², CFHT-LS³, Pan-STARRS⁴, KIDS⁵, SNAP⁶, LSST⁷). (Although the scales these surveys might resolve are comparable to the box size considered here, these surveys need not rely solely on the red sequence (RS) method described in this paper.) We use the simulated galaxy and halo catalogues at three fixed epochs given by redshifts z = 0.41, 0.69 and 0.99. These values span much of the expected redshift range of interest for a survey such as the RCS. Halos in the simulation are found using a friends-of-friends (FoF) algorithm (Davis et al. 1985), and galaxy membership is assigned from these FoF groups. The FoF linking length (0.2 times the mean interparticle spacing) can join into one large halo objects that by eye appear to be separate, smaller components; we note below where our results show signs of this effect. Halo masses are given in terms of M_200c (denoted M henceforth), the mass enclosed within a radius inside which the mean density is 200 times the critical density at that epoch. At our three redshifts there are 1268, 805 and 426 halos with M > 10^14 h^-1 M_⊙, and 113, 47 and 19 halos with M > 3 × 10^14 h^-1 M_⊙.
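As a worked illustration of the M_200c definition, the sketch below inverts M = (4π/3) × 200 ρ_crit(z) R³ to give the corresponding radius at each of our three epochs. It assumes a flat ΛCDM expansion history with the MS values Ω_m = 0.25 and Ω_Λ = 0.75 (radiation neglected) and the standard present-day critical density 2.775 × 10^11 h² M_⊙ Mpc⁻³; the function names are hypothetical and not part of any MS data release.

```python
import math

# Present-day critical density in (h^-1 M_sun) / (h^-1 Mpc)^3,
# i.e. 2.775e11 h^2 M_sun Mpc^-3 expressed in "little h" units.
RHO_CRIT0 = 2.775e11
OMEGA_M, OMEGA_L = 0.25, 0.75  # Millennium Simulation values (flat, radiation ignored)


def efunc_sq(z: float) -> float:
    """E(z)^2 = H(z)^2 / H0^2 for a flat LCDM model."""
    return OMEGA_M * (1.0 + z) ** 3 + OMEGA_L


def r200c(mass: float, z: float) -> float:
    """Proper radius (h^-1 Mpc) enclosing a mean density of 200 rho_crit(z).

    Inverts M_200c = (4 pi / 3) * 200 * rho_crit(z) * R^3, with mass in h^-1 M_sun.
    """
    rho_c = RHO_CRIT0 * efunc_sq(z)
    return (3.0 * mass / (4.0 * math.pi * 200.0 * rho_c)) ** (1.0 / 3.0)


if __name__ == "__main__":
    for z in (0.41, 0.69, 0.99):
        print(f"z={z:.2f}: R200c(1e14) = {r200c(1e14, z):.3f}, "
              f"R200c(3e14) = {r200c(3e14, z):.3f} h^-1 Mpc (proper)")
```

For M = 10^14 h^-1 M_⊙ at z = 0.41 this gives a proper radius of roughly 0.67 h^-1 Mpc (about 0.94 h^-1 Mpc comoving); at fixed mass the radius shrinks with redshift because ρ_crit(z) grows.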

For the red sequence search, we employ the SDSS r and z filters, which bracket the 4000 Å break for approximately 0.5 < z < 1. At the highest redshift, we also



2 http://www.darkenergysurvey.org

3 http://cadcwww.hia.nrc.ca/cfht/cfhtls/

4 http://pan-starrs.ifa.hawaii.edu

5 http://www.astro-wise.org/projects/KIDS/

6 http://snap.lbl.gov

7 http://www.lsst.org
Table 1. Changes in redshifts, colors and cuts for the three boxes used.

Redshift   z_min   z_max   intercept   slope   max RS dist. Δ⊥   d(r − z)/d(h^-1 Gpc) low/high   dm_z/d(h^-1 Gpc) low/high
0.41       0.31    0.51    0.52        0.028   0.078             -0.72/1.30                      -0.56/0.52
0.69       0.57    0.81    0.72        0.052   0.14              -1.10/0.36                      -0.80/0.64
0.99       0.85    1.14    0.75        0.060   0.18              -0.40/0.72                      -1.08/1.40

Change in redshift across the Millennium box at each redshift (z_min, z_max), red sequence intercept and slope, maximum distance from the red sequence in color-magnitude space (Δ⊥), the r − z color change across the box (to the front, and then to the back, per h^-1 Gpc), and the z magnitude change across the box (likewise per h^-1 Gpc). Color and magnitude changes are taken from the Bruzual & Charlot (2003) model as described in the text; see also Fig. 2a.
Figure 2. a) Solid lines show the r − z color evolution of a z = 3 burst population synthesis model of Bruzual & Charlot (2003). Circles show the mean colors of MS red sequence galaxies at the three redshifts we investigate, while straight line segments give the color gradients applied when projecting the galaxy population along a line of sight (Table 1). Vertical portions of the dashed boxes at each epoch mark the foreground and background redshifts of the ±250 h^-1 Mpc volume, while the horizontal lines mark the approximate width of the red sequence. b) The relative fraction of galaxies remaining on the red sequence as a function of projected distance (heavy lines). Solid, dotted, and dashed lines correspond to z = 0.41, 0.69 and 0.99, respectively. Thin lines give the relative number of galaxies that move onto the red sequence as their observed color and magnitude vary due to their line-of-sight displacement. All counts are normalized by the number of red sequence galaxies within the volume at each redshift.



considered the i band; our results for this case are described at the end of §4. Results below are given for r − z unless stated otherwise.



2.1 Galaxy Colors in Massive Halos 

Fig. 1 shows that a red sequence in r − z vs. z exists in rich MS halos over the range of redshifts probed. We use galaxies above ~ \L*, corresponding to z-band magnitude limits of 19.6, 21 and 22 at redshifts 0.41, 0.69 and 0.99 and yielding samples of 942313, 1005469 and 1054711 galaxies, respectively. The top panels show contours of the full, magnitude-limited population, while the lower panels show the color-magnitude behavior of galaxies in halos with 8 or more members.



Taking galaxies within the inner 0.5 h^-1 Mpc of the centers of the latter sample, we fit a linear relation in the r − z vs. z plane. Following Gladders et al. (1998), we discard 3σ outliers and iterate the fit to find the slope and intercept of the red sequence. The width of the red sequence is set to enclose 90% of the full massive halo galaxy sample⁸. The distance, Δ⊥, is measured perpendicular to the red sequence line in color-magnitude space. Table 1 lists the slopes, intercepts, and widths of the red sequence for all three redshifts. The red sequence color-magnitude relation is a weak
8 If one instead uses > 90% of these galaxies, the red sequence widens and, at high redshift, slightly increases the contamination from projection studied here.
function of halo mass or richness, so the parameters are not particularly sensitive to the choice of halos with 8 or more members.
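The red sequence fit and width definition above can be sketched as follows, assuming NumPy arrays of z-band magnitude and r − z color. The clipping loop (refit until the set of 3σ outliers stops changing) and the use of the 90% quantile of the perpendicular distance Δ⊥ are our reading of the procedure; the function names are hypothetical.

```python
import numpy as np


def fit_red_sequence(mag, color, n_sigma=3.0, max_iter=20):
    """Iteratively fit color = intercept + slope * mag, rejecting n_sigma outliers."""
    mag = np.asarray(mag, dtype=float)
    color = np.asarray(color, dtype=float)
    keep = np.ones(mag.size, dtype=bool)
    for _ in range(max_iter):
        slope, intercept = np.polyfit(mag[keep], color[keep], 1)
        resid = color - (intercept + slope * mag)
        sigma = resid[keep].std()
        new_keep = np.abs(resid) < n_sigma * sigma
        if np.array_equal(new_keep, keep):
            break  # converged: the outlier set no longer changes
        keep = new_keep
    return intercept, slope


def perpendicular_distance(mag, color, intercept, slope):
    """Distance of each point from the line color = intercept + slope * mag."""
    resid = np.asarray(color, dtype=float) - intercept - slope * np.asarray(mag, dtype=float)
    return np.abs(resid) / np.hypot(1.0, slope)


def red_sequence_width(mag, color, intercept, slope, enclose=0.90):
    """Perpendicular distance (Delta_perp) enclosing a fraction `enclose` of the sample."""
    return np.quantile(perpendicular_distance(mag, color, intercept, slope), enclose)
```

In the setup described above, the fit would use galaxies within 0.5 h^-1 Mpc of the centers of halos with 8 or more members, while the width is evaluated on the full massive-halo sample, so the two calls would receive different subsets.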

Defining the red sequence using the MS galaxy population itself means that our color cuts are optimally tuned to the content of the MS massive halos. With observations, one derives color cuts from the color-magnitude data of a target subset of galaxies, as in the approaches of Gladders et al. (1998) and Koester et al. (2007). Compared with observations, the mock red sequence appears to have the wrong tilt and to be slightly wider than observed. We experimented with "tightening" the red sequence by moving the galaxy colors closer to the best-fit line, but this procedure did not have a large effect on our conclusions, so we present our results using the colors as provided.

We wish to use projections of each simulation output, a snapshot at a single proper time, to create finite redshift segments of a full-sky survey. Starting with the coeval MS galaxy samples, we introduce passive color and magnitude evolution into spatial projections to mimic the behavior of a light-cone population. The color evolution with redshift is based on an instantaneous Bruzual-Charlot (BC) burst at z = 3 and is shown for r − z in Fig. 2a⁹. For comparison, we show the average (slightly bluer) color of the MS red sequence galaxies at our three redshifts. The MS red sequence galaxies are expected to be bluer than the BC model, since their stars were not formed in a single burst at high redshift. The MS galaxies are also bluer than brightest cluster galaxies (BCGs) in the SDSS (Bernardi et al. 2007).

We use this simple BC model to define piecewise constant color gradients, d(r − z)/d(redshift), along the line of sight, shown by the solid line segments in Fig. 2a. We define a z magnitude gradient analogously. Foreground and background color-magnitude evolution are modeled separately, with parameters given in Table 1. Fainter galaxies may evolve into the z magnitude cut because of the change in observed magnitude with redshift. To catch these potential interlopers, we employ galaxy catalogues half a magnitude fainter in z-band than required by the unevolved red sequence cuts.
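A minimal sketch of how the piecewise constant gradients of Table 1 could be applied to projected galaxies, assuming the signed line-of-sight offset is measured in comoving h^-1 Mpc (negative in front of the box center) and that each Table 1 pair gives the change per h^-1 Gpc travelled toward the front and toward the back, respectively. That sign convention is our interpretation of the table, and `evolve` and `GRADIENTS` are hypothetical names.

```python
import numpy as np

# Table 1 changes per h^-1 Gpc as (front, back) pairs. Reading the first entry as
# the change towards the front and the second as the change towards the back is
# an interpretation of the table, not a documented convention.
GRADIENTS = {
    0.41: {"color": (-0.72, 1.30), "mag": (-0.56, 0.52)},
    0.69: {"color": (-1.10, 0.36), "mag": (-0.80, 0.64)},
    0.99: {"color": (-0.40, 0.72), "mag": (-1.08, 1.40)},
}


def evolve(color, mag, d_los, z_box):
    """Apply piecewise-constant line-of-sight evolution (hypothetical helper).

    color, mag : unevolved r - z colors and z-band magnitudes
    d_los      : signed comoving offset from the box centre in h^-1 Mpc
                 (negative = foreground, positive = background)
    z_box      : one of the snapshot redshifts in GRADIENTS
    """
    g = GRADIENTS[z_box]
    d = np.asarray(d_los, dtype=float)
    dist_gpc = np.abs(d) / 1000.0  # h^-1 Mpc -> h^-1 Gpc
    front = d < 0
    d_color = np.where(front, g["color"][0], g["color"][1]) * dist_gpc
    d_mag = np.where(front, g["mag"][0], g["mag"][1]) * dist_gpc
    return np.asarray(color, dtype=float) + d_color, np.asarray(mag, dtype=float) + d_mag
```

Under this convention, a galaxy with r − z = 1.5 and z = 19.0 placed 250 h^-1 Mpc in front of the z = 0.41 box center would appear with color 1.32 and magnitude 18.86, and the same galaxy 250 h^-1 Mpc behind would appear with 1.825 and 19.13. Drawing the input galaxies from a catalogue half a magnitude deeper than the unevolved cut, and applying the cut only after calling `evolve`, mirrors the interloper treatment described above.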

Note that the applied color gradient becomes progressively shallower at higher redshift. The assumed degree of color and magnitude evolution is key, since it controls the redshift filtering power of the red sequence. To foreshadow one of our main results, Fig. 2a illustrates how the color evolution determines the line-of-sight path length probed by the red sequence color range. The dashed regions in Fig. 2a are centered at the average color of the red sequence galaxies at each redshift and are bounded vertically by the approximate color range of the red sequence. They are bounded horizontally by the redshift extents of the comoving ±250 h^-1 Mpc sightline available within the MS volume. At z = 0.41, the evolutionary color gradients are strong enough that projected red sequence galaxies shift out of the target color range before the ±250 h^-1 Mpc MS boundary is reached, but this is not quite the case at z = 0.69 and 0.99.
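The statement above can be checked roughly from Table 1 alone. Under the same front/back reading of the gradients used for the projections, and treating Δ⊥ as the half-width of the color cut, the sketch below computes the comoving distance at which a galaxy lying exactly on the red sequence line drifts a perpendicular distance Δ⊥ away from it, combining the color drift with the magnitude drift along the tilted line. This is an order-of-magnitude check under those assumptions, not the calculation behind Fig. 2; `TABLE1` and `exit_distance` are hypothetical names.

```python
import math

# Table 1: red-sequence slope, max perpendicular distance Delta_perp, and
# (front, back) changes per h^-1 Gpc in r - z color and z magnitude.
TABLE1 = {
    0.41: dict(slope=0.028, dperp=0.078, color=(-0.72, 1.30), mag=(-0.56, 0.52)),
    0.69: dict(slope=0.052, dperp=0.14, color=(-1.10, 0.36), mag=(-0.80, 0.64)),
    0.99: dict(slope=0.060, dperp=0.18, color=(-0.40, 0.72), mag=(-1.08, 1.40)),
}


def exit_distance(z_box, side):
    """Comoving distance (h^-1 Mpc) at which a galaxy on the red-sequence line
    drifts Delta_perp away from it; side = 0 for the front, 1 for the back."""
    row = TABLE1[z_box]
    b = row["slope"]
    # Rate of change of the signed perpendicular distance (c - a - b*m) / sqrt(1 + b^2)
    # per h^-1 Gpc travelled, combining the color and magnitude drifts.
    rate = (row["color"][side] - b * row["mag"][side]) / math.hypot(1.0, b)
    return 1000.0 * row["dperp"] / abs(rate)


for z in TABLE1:
    print(f"z={z}: front {exit_distance(z, 0):.0f}, back {exit_distance(z, 1):.0f} h^-1 Mpc")
```

With the Table 1 values this gives roughly 111 h^-1 Mpc (front) and 61 h^-1 Mpc (back) at z = 0.41, 132 and 429 h^-1 Mpc at z = 0.69, and 538 and 284 h^-1 Mpc at z = 0.99, so only at z = 0.41 do both limits fall inside the ±250 h^-1 Mpc box.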



9 We thank N. Padmanabhan and B. Koester for the evolution of galaxy colors using Bruzual & Charlot (2003), as in Padmanabhan et al. (2006).



Fig. 2b further illustrates how the imposed color evolution acts as a redshift filter. Applying our line-of-sight gradients to the color and magnitude of each red sequence galaxy, Fig. 2b shows the fraction of these galaxies that remain on the red sequence as a function of line-of-sight distance. Such galaxies would still be potential members of a cluster centered at the origin. A more narrowly peaked distribution indicates that a smaller fraction of galaxies is available for inclusion via projection during cluster finding. As can be seen, the fraction of galaxies remaining within the red sequence cut at large distances from the origin increases with redshift; the red sequence selects a longer path along the line of sight at higher redshift.
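The Fig. 2b measurement can be approximated in a few lines, assuming a sample of galaxies with magnitudes and colors, the red sequence parameters of Table 1, and the same front/back gradient convention used for the projections. For simplicity this sketch ignores galaxies that leave the sample by crossing the z-band magnitude limit, which a full calculation would also track; the function name is hypothetical.

```python
import numpy as np


def fraction_remaining(mag, color, intercept, slope, dperp, d_grid,
                       color_rates, mag_rates):
    """Fraction of red-sequence galaxies still within dperp of the red-sequence
    line after displacement by each comoving offset in d_grid (h^-1 Mpc).

    color_rates, mag_rates: (front, back) changes per h^-1 Gpc, as in Table 1.
    The z-band magnitude limit is deliberately ignored in this sketch.
    """
    mag = np.asarray(mag, dtype=float)
    color = np.asarray(color, dtype=float)
    norm = np.hypot(1.0, slope)
    on_rs = np.abs(color - intercept - slope * mag) / norm <= dperp
    m0, c0 = mag[on_rs], color[on_rs]
    fractions = []
    for d in d_grid:
        side = 0 if d < 0 else 1  # 0 = foreground, 1 = background
        dist_gpc = abs(d) / 1000.0
        c = c0 + color_rates[side] * dist_gpc
        m = m0 + mag_rates[side] * dist_gpc
        still_on = np.abs(c - intercept - slope * m) / norm <= dperp
        fractions.append(still_on.mean())
    return np.array(fractions)
```

Evaluating this on a fine grid of offsets for each redshift would give curves qualitatively like the heavy lines in Fig. 2b, though the simulated populations, not a synthetic sample, set their actual shape.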

The other source of contamination is galaxies that are shifted onto the red sequence by the change in observed color. The number density of these galaxies, normalized by the number of red sequence galaxies at the central redshift, is shown by the light lines in Fig. 2b. Except for the most distant part of the box at z = 0.41, this number is relatively small. Our use of a uniform color change with redshift for all galaxies is not strictly correct for all galaxy types. However, the observed colors of blue star-forming galaxies change much more slowly with redshift than in this model, so to be shifted erroneously into our red sequence color cut, these galaxies would need to be at significantly higher redshift than the cluster. Since they would then lie outside our 500 h^-1 Mpc box, they are not included in our analysis. The strongest contribution to interloper candidates is from galaxies which have colors within our red sequence color cut even though they are far from the central galaxy along the line of sight.

2.2 Cluster Finding Algorithm 

Our algorithm defines clusters as circular regions, centered on a bright galaxy, with red sequence sky surface density equal to a multiple Δ_ρ of the mean value at the redshift of interest. This approach is analogous to the spherical overdensity method used to define the halo masses. For target centers, we work in descending order through a list of red sequence galaxies ranked (brightest to dimmest) by apparent z-band magnitude. This ranking is motivated by a desire to find the rare, high-mass halos first, then work down the mass function to more common objects.

Around a potential cluster center, a radially sorted list of red sequence neighbors is used to define a mean galaxy number density profile as a function of transverse separation. We use the periodic boundaries of the MS to recenter the simulated volume on each candidate center. The volume extends 250 h^-1 Mpc in front and behind, and galaxy colors are adjusted, linearly with distance in the projected direction, as described above. Starting with the 8 nearest neighbors (to avoid shot noise problems in tracing the cluster profiles at small radii), we work outward in radius r_gal until the number of galaxies N_gal fails to satisfy the overdensity criterion

$$ \frac{N_{\rm gal}}{\pi r_{\rm gal}^2} \ge \Delta_\rho \, \bar{n} . \qquad (1) $$

Here n̄ is the mean sky surface density of red sequence galaxies in the MS, including the effects of the applied observed color evolution along the projected dimension. If the overdensity criterion is not satisfied for 8 galaxies, the object is discarded; if N_gal meets or exceeds the minimum of 8 galaxies, the cluster is added to the output list. All members are then recorded and removed from the remaining list of potential cluster centers¹⁰.
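The search loop described above can be sketched as a greedy procedure over one projection. This is an illustration under stated assumptions, not the authors' code: positions are assumed already projected and color-selected (including the line-of-sight evolution), the center is counted among the first 8 galaxies, separations use the minimum-image convention in the 500 h^-1 Mpc periodic box, and members are barred from seeding new clusters but may still join later ones, so overlaps remain possible. The function name is hypothetical.

```python
import numpy as np


def find_clusters(x, y, mag, n_bar, delta_rho=7.0, box=500.0, n_min=8):
    """Greedy circular-overdensity cluster finder on one 2D projection (sketch).

    x, y  : projected comoving positions (h^-1 Mpc) of red-sequence galaxies
    mag   : apparent z-band magnitudes, used only to rank candidate centres
    n_bar : mean sky surface density of red-sequence galaxies per (h^-1 Mpc)^2
    Returns a list of (centre_index, member_indices) tuples.
    """
    x, y, mag = (np.asarray(a, dtype=float) for a in (x, y, mag))
    can_be_centre = np.ones(x.size, dtype=bool)
    clusters = []
    for c in np.argsort(mag):  # brightest (smallest magnitude) first
        if not can_be_centre[c]:
            continue
        # Minimum-image separations in the periodic box.
        dx = x - x[c]
        dy = y - y[c]
        dx -= box * np.round(dx / box)
        dy -= box * np.round(dy / box)
        r = np.hypot(dx, dy)
        order = np.argsort(r)
        radii = r[order]  # radii[0] == 0 is the centre itself
        n_gal = 0
        # Walk outward from the n_min nearest galaxies (centre included) while
        # N_gal / (pi r_gal^2) >= delta_rho * n_bar still holds.
        for k in range(n_min, x.size + 1):
            if k >= delta_rho * n_bar * np.pi * radii[k - 1] ** 2:
                n_gal = k
            else:
                break
        if n_gal >= n_min:
            members = order[:n_gal]
            clusters.append((int(c), members))
            # Members may still join later clusters, but no longer seed one.
            can_be_centre[members] = False
    return clusters
```

The brute-force distance calculation is fine for a demonstration; for a full MS projection a periodic spatial tree would be the natural replacement.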

Note that area overlap of clusters is allowed, so a single galaxy can belong to more than one cluster (6-7% of the galaxies end up in more than one cluster at the two lowest redshifts, dropping to 4-5% at the highest redshift; in contrast, each galaxy belongs to only one MS halo). To boost statistics, we make three projections of the simulated volume, one along each of its principal axes.

The choice of Δ_ρ is discussed below. The sensitivity of survey purity and completeness to the choice of Δ_ρ is further explored in the appendix.
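To make the role of the threshold Δ_ρ concrete: because the outward walk stops where the overdensity criterion of §2.2 first fails, a cluster's final radius approximately saturates it. Taking Δ_ρ = 7 (the value adopted in §3.1) and reading the ~0.7 (h^-1 Mpc)^-2 quoted in footnote 10 as the mean background density n̄, a rough estimate of the radius at a given richness is:

```latex
r_{\rm gal} \simeq \left( \frac{N_{\rm gal}}{\pi\,\Delta_\rho\,\bar{n}} \right)^{1/2},
\qquad
\pi\,\Delta_\rho\,\bar{n} \approx \pi \times 7 \times 0.7\ (h^{-1}\,{\rm Mpc})^{-2} \approx 15.4\ (h^{-1}\,{\rm Mpc})^{-2},

N_{\rm gal} = 8 \Rightarrow r_{\rm gal} \approx 0.72\ h^{-1}\,{\rm Mpc}, \qquad
N_{\rm gal} = 20 \Rightarrow r_{\rm gal} \approx 1.14\ h^{-1}\,{\rm Mpc}, \qquad
N_{\rm gal} = 50 \Rightarrow r_{\rm gal} \approx 1.80\ h^{-1}\,{\rm Mpc}.
```

Since r_gal scales as Δ_ρ^(-1/2) at fixed richness, raising the threshold confines clusters to smaller, denser cores, which is the trade-off discussed in §3.1.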

2.3 Cluster-Halo Matching 

The clusters found by the search algorithm can be linked back to the dark matter halos in the simulation using their respective lists of galaxy members. A perfect algorithm would be complete with respect to halos and have no false positives, i.e., no clusters that appear rich on the sky but are actually several less rich systems aligned along the line of sight. In addition, the halo and cluster richnesses would agree. A perfect algorithm would therefore recover the intrinsic distribution of halo mass M as a function of red sequence galaxy richness N_gal. This distribution is shown in the top panels of Fig. 3.

At all redshifts, a mean red sequence richness of N_gal ~ 20 above our z-magnitude limit corresponds to a ~10^14 h^-1 M_⊙ halo. When fit (discarding 3σ outliers over several iterations) above a minimum of 8 members, mass scales with red sequence richness as M = M_20 (N_gal/20)^α, with α = 1.07, 1.10 and 1.10 at z = 0.41, 0.69 and 0.99, respectively. The mass intercepts are M_20 = 1.3, 1.3 and 1.5 × 10^14 h^-1 M_⊙, and there are ~4100, 2900 and 1300 halos with N_gal ≥ 8 at these redshifts, respectively. Note that red sequence richness is a fairly noisy tracer of mass; the rms scatter is ~50% or higher above the richness cut of N_gal = 8 (detailed discussions of the scatter in richness vs. mass can be found in White & Kochanek 2002 and Dai, Kochanek & Morgan 2007). The richness we use in finding the clusters may not be the best richness for estimating cluster mass: galaxy counts within some aperture might be useful after the clusters are found, although for finding the clusters themselves we found that a fixed aperture performed significantly worse. Some observational surveys for galaxy overdensities account for projections of foreground/background galaxies via a statistical subtraction of the expected number of projected galaxies, calculated from

10 Roughly, the cluster will have a density of red sequence galaxies Δ_ρ times the average red sequence (background) density, ~0.7 (h^-1 Mpc)^-2 in our case. The approximate change of radius with richness can be read off from Eq. 1. Note too that our cluster-finding algorithm traces galaxy overdensities to radii which can exceed 1 h^-1 Mpc. This increases the survey sensitivity to truly extended structures, but may also increase the cross-section for interlopers relative to algorithms which search for clusters only on a limited, smaller scale; however, a cluster finder based on fixed-aperture richness performed significantly more poorly.



random non-cluster pointings. Our cluster richness estimator, N_gal, does not include such a correction; our overdensity requirement means that approximately 1/Δ_ρ of the galaxies are from the background.
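The mass-richness fit quoted above can be sketched as a straight-line fit in log space with iterative 3σ clipping, using the pivot N_gal = 20 so that the intercept is M_20 directly. Fitting in log10 M is our assumption (the text does not state whether the fit was done in log or linear mass), and the function name is hypothetical.

```python
import numpy as np


def fit_mass_richness(n_gal, mass, n_min=8, pivot=20.0, n_sigma=3.0, max_iter=20):
    """Fit log10 M = log10 M_20 + alpha * log10(N_gal / pivot) with sigma clipping.

    Uses objects with N_gal >= n_min. Returns (M_20, alpha, rms_dex).
    """
    n_gal = np.asarray(n_gal, dtype=float)
    mass = np.asarray(mass, dtype=float)
    sel = n_gal >= n_min
    x = np.log10(n_gal[sel] / pivot)
    y = np.log10(mass[sel])
    keep = np.ones(x.size, dtype=bool)
    for _ in range(max_iter):
        alpha, log_m20 = np.polyfit(x[keep], y[keep], 1)
        resid = y - (log_m20 + alpha * x)
        rms = resid[keep].std()
        new_keep = np.abs(resid) < n_sigma * rms
        if np.array_equal(new_keep, keep):
            break  # outlier set has stabilised
        keep = new_keep
    return 10.0 ** log_m20, alpha, rms
```

The returned rms is in dex; an rms of about 0.18 dex corresponds to a multiplicative scatter of about 1.5 in mass.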

For each cluster identified in projection, we list all halos contributing one or more of its member galaxies. The quality of the cluster detection is measured by the top-ranked matched fraction, f_1h, defined as the fraction of cluster members coming from the halo that contributes the plurality of the cluster's red sequence galaxies. We define two classes, clean and blended, based on whether or not the plurality is also a majority of the cluster's membership:

$$ \text{clean}: \quad f_{1h} > 0.5, \qquad (2) $$
$$ \text{blended}: \quad f_{1h} < 0.5. \qquad (3) $$

We assign to each cluster the mass of its top-ranked halo found through member matching. If two (or more) halos contribute the same number of galaxies and are thus jointly top-ranked, we take the most massive.
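The matching step can be sketched for a single cluster, assuming each member galaxy carries the ID of the FoF halo it belongs to and that halo masses are available in a lookup table; the names are hypothetical. Equations (2) and (3) leave f_1h = 0.5 unassigned; this sketch treats that case as blended, since the plurality is then not a majority.

```python
from collections import Counter


def match_cluster(member_halo_ids, halo_mass):
    """Top-ranked halo, f_1h and clean/blended label for one cluster (sketch).

    member_halo_ids : halo ID of each cluster member (each galaxy has one halo)
    halo_mass       : mapping halo ID -> M_200c, used only to break ties
    """
    counts = Counter(member_halo_ids)
    top_count = max(counts.values())
    tied = [h for h, n in counts.items() if n == top_count]
    top_halo = max(tied, key=lambda h: halo_mass[h])  # ties -> most massive halo
    f_1h = top_count / len(member_halo_ids)
    label = "clean" if f_1h > 0.5 else "blended"  # f_1h == 0.5 treated as blended
    return top_halo, f_1h, label
```

Each cluster is then assigned the mass `halo_mass[top_halo]`, which is the quantity plotted against cluster richness in Fig. 3 (bottom).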



3 RESULTS 

An ideal cluster catalog would be pure, complete and unbiased with respect to halos. A perfectly pure sample would have no accidental projections; all the galaxies of any chosen cluster would be common members of a single dark matter halo. A perfectly complete sample would be one in which each halo in the survey volume appears once, and only once, in the list of clusters. Finally, an unbiased cluster catalog would contain clusters that reproduce the mean mass-richness relation defined by halos. In this section, we consider these issues, both in the context of setting our circular overdensity threshold and in the results obtained. We will see that high levels of purity and completeness are achieved, and that the cluster samples are nearly unbiased. (Many definitions of purity and completeness exist in the literature; we describe and compare several of them in the appendix, and detail our definitions as we use them below.)

3.1 Cluster finder threshold and two examples 

The cluster catalogs produced by the search algorithm depend on the value of the number density threshold Δ_ρ. Choosing too high a value will pick out only the cores of the richest halos, resulting in a catalog that is pure and complete at very high masses but otherwise incomplete. Picking too low a value will extend the search into the periphery of halos, leading to a catalog that, although complete across a wide range of masses, suffers from impurities due to blending of multiple halos into a single cluster.

Our choice of Δ_ρ = 7 and N_obs ≥ 8 for clusters provides samples that are highly complete for halos with N_true > 20. Fig. 4 shows a measure of completeness: the fraction of halos assigned as top-ranked matches to clusters with N_gal ≥ 8. The completeness is very high for halos with intrinsic N_true > 20, but it drops considerably for lower-richness halos. More halos are missed at higher redshift, and these tend to have extended, filamentary shapes suggestive of recent (or imminent) merging. At higher redshift the major merger rate increases, leading to a higher fraction of disturbed halos.
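The completeness measure of Fig. 4 can be sketched as the fraction of halos, per bin of intrinsic richness, that appear as the top-ranked match of at least one cluster. This assumes halo IDs and the top-ranked matches (for example from a matching step like the one in §2.3) are available as arrays; the function name is hypothetical.

```python
import numpy as np


def differential_completeness(halo_richness, halo_ids, matched_ids, bin_edges):
    """Fraction of halos per intrinsic-richness bin that are the top-ranked match
    of at least one cluster. matched_ids may contain duplicates."""
    halo_richness = np.asarray(halo_richness, dtype=float)
    found = np.isin(np.asarray(halo_ids), np.asarray(list(set(matched_ids))))
    total, _ = np.histogram(halo_richness, bins=bin_edges)
    hits, _ = np.histogram(halo_richness[found], bins=bin_edges)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(total > 0, hits / total, np.nan)  # NaN for empty bins
```

Passing only the IDs matched by clean clusters gives the analogue of the shaded region in Fig. 4.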

Keeping the cluster richness threshold fixed at N_obs ≥ 8 in order
Figure 3. Top: Relation between halo mass and intrinsic red sequence galaxy richness at z = 0.41, 0.69 and 0.99 (left to right). Bottom: Relation between top-ranked halo mass and cluster red sequence galaxy richness at the same redshifts, taken along one projection axis. Crosses have f_1h < 0.5 and comprise (12%, 15%, 20%) of the N_gal ≥ 8 clusters. Solid (dashed) lines are least-squares fits for N_gal ≥ 8 halos (clean clusters).



to define whether a halo is found or not (completeness), samples derived with higher values of Δ_ρ will be purer (have fewer blends) but less complete, and vice versa for samples constructed with lower Δ_ρ. Further quantitative discussion of purity and completeness can be found in the appendix.

Fig. 5 shows that, at each redshift, the value Δ_ρ = 7 produces a cluster catalog with a richness function, n(N_gal), that closely matches that of the underlying halo population. Averaging the three projections, there are 4432, 2919 and 1321 clusters with N_gal ≥ 8 at z = 0.41, 0.69 and 0.99, respectively. These values compare well to the MS halo counts of 4098, 2926 and 1290 for N_gal ≥ 8. The scatter in cluster numbers among the different lines of sight is less than one percent at z = 0.41 and less than four percent at z = 0.99.

The good match in number counts does not imply that the algorithm is perfect. In fact, the typical number of halos contributing to an N_gal ≥ 8 cluster is ~N_gal/4. The second and fifth richest clusters found at z = 0.41 illustrate the range of behavior in clean and blended clusters. Figure 6 shows projected positions and color-magnitude information for sky patches centered on the two clusters. The second richest cluster has 212 members contributed by 21 different halos. Members of one of the most massive halos at that epoch, M = 2.0 × 10^15 h^-1 M_⊙, make up 88% of the cluster members. The remaining members come from 20 other halos, including some lying in the foreground; a small number of members are contributed by halos in the background.

The fifth richest cluster, with 175 members, presents a very different case. Its most massive contributing halo, with mass M = 4.2 × 10^14 h^-1 M_⊙, contributes almost all of its own galaxies but only 35% of the cluster's members (f_1h = 0.35). A total of 53 other halos also contribute, many lying close (within 30 h^-1 Mpc) in the foreground or background.

Although much richer than most of the halos considered, these two examples illustrate the essential projection problem that causes the blends: both sets of galaxies appear to be reasonable clusters in the x-y plane. In the next two sections, the statistics of the clean and blended clusters, and their features, are discussed in more detail.
Figure 4. Differential completeness of the N_obs ≥ 8 cluster population with respect to halos as a function of their intrinsic red sequence richness. The circular overdensity defining the cluster population is Δ_ρ = 7, and panels show results for (left to right) redshifts z = 0.41, 0.69 and 0.99. Here, completeness is the fraction of halos that contribute the plurality of a cluster's red sequence galaxy population. The solid line is the fraction associated with all clusters and the shaded region is the fraction found in clean (f_1h > 0.5) clusters. The vertical line marks the imposed minimum cluster richness (N_min = 8). Projection effects introduce scatter between intrinsic and apparent richness that blurs the sharp observed threshold into a smooth intrinsic selection function.
Figure 5. Cumulative number of halos as a function of red sequence richness N_gal (bold) compared to the projection-averaged number of clusters found with the circular overdensity algorithm with Δ_ρ = 7 (light) at redshifts z = 0.41, 0.69 and 0.99 (left to right).



3.2 Mass selection function of clusters 

The mass selection function is an important ingredient for cosmological tests with optical cluster surveys (White & Kochanek 2002; Rozo et al. 2007). Fig. 3 (bottom) shows the relationship between the observed richness of a cluster and the mass of its top-ranked halo (see §2.3). Circles show clean clusters while small crosses show blends. At each redshift, the clean cluster population displays a power-law mean relation remarkably similar to that of the underlying halo population. The slopes of the relations agree at the few percent level; the values for halos (clean clusters) with N_gal ≥ 8 are 1.07 (1.04), 1.10 (1.06) and 1.10 (1.15) from low to high redshift, respectively. The intercepts at N_gal = 20 also agree at the few percent level, and could be further fine-tuned by introducing small changes to the search threshold Δ_ρ at each redshift. At all redshifts, the circular overdensity algorithm is effective at recovering the mean richness-mass behavior of the underlying halo population.

The dispersion in the observed cluster sample is larger 
than for halos, due to failure modes of the search algo- 
Figure 6. Examples of blended (left, 5th richest) and clean (right, 2nd richest) clusters found at z = 0.41. Filled triangles are members of the first-rank matched halo; open circles are other cluster members. Open squares are members of the best-fit halo not in the cluster; they fall outside the red sequence, as can be seen in the lower left-hand panel. Triangular stars are members of the red sequence not in the found cluster. Comoving scales are shown; note that the axes in the dz vs. dx panels are scaled differently.




Figure 7. Mass selection functions p(M|JV gal , z) with richness 7V gal = (18, 18, 16) ± 4 at redshifts z = (0.41, 0.69, 0.99) (left to right). 
Solid lines give the intrinsic halo mass distribution in these richness ranges, and are the same in each column. The shaded distribution in 
the upper row gives p(M \N g ai) for clusters, with M the mass of its top-ranked matched halo (§ 2.3). The middle row shows p(M|7V ga i) 
for clean clusters (fih > 0-5) while the bottom row gives the mass distribution of blended clusters (fih < 0.5). The average mass of the 
halos/clean clusters/blended clusters are shown respectively in the top/middle/bottom panels for each redshift. The fraction of clean 
clusters / c i can i s a l so given in the middle row for each redshift. 
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Figure 8. Cumulative fraction of clusters in Fig.[7Jas a function 
of their top-ranked halo overlap fraction, fi^. Top to bottom lines 
are redshifts, z=0.99, 0.69 and 0.41. The fraction of galaxies from 
the top-ranked halo declines with increasing redshift. The vertical 
line identifies the fraction of clusters that are blends, f lh < 0.5. 

rithm. At fixed observed richness, blending creates a tail to 
low masses while fragmentation of large halos into multiple 
clusters introduces a high mass tail. Fig. [7] shows estimates 
of the conditional halo mass distribution, p(M\ JV gal , z), de- 
rived from cross-sections of the joint likelihood data in 
Fig. [3] in richness ranges iV ga i = (18, 18, 16) ± 4 at redshifts 
(0.41,0.69,0.99), respectively. This choice gives a constant 
average halo mass, 1.2 x 10 14 /i _1 Mq, at all three redshifts. 
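The cross-sections used for Fig. 7 amount to selecting found clusters in a richness window and histogramming the mass of each one's top-ranked halo, optionally split by f_1h. A minimal sketch, assuming each cluster is stored as a hypothetical (N_gal, top-ranked halo mass, f_1h) tuple and using an arbitrary log-mass binning of our own choosing:

```python
import math
from collections import Counter

def mass_slice(clusters, n_center, half_width=4, dlog10m=0.1):
    """Raw counts of top-ranked halo mass for clusters with |N_gal - n_center| <= half_width.

    clusters: iterable of (n_gal, top_halo_mass, f1h) tuples (hypothetical layout).
    Returns (counts, mean_mass): counts keyed by the lower edge of each log10(M) bin,
    and the mean top-ranked halo mass of the selected clusters.
    """
    selected = [c for c in clusters if abs(c[0] - n_center) <= half_width]
    if not selected:
        return Counter(), float("nan")
    counts = Counter()
    for _, m, _ in selected:
        edge = math.floor(math.log10(m) / dlog10m) * dlog10m
        counts[round(edge, 6)] += 1  # rounding keeps float bin keys tidy
    mean_mass = sum(m for _, m, _ in selected) / len(selected)
    return counts, mean_mass

def split_by_f1h(clusters, cut=0.5):
    """Split into clean (f1h > cut) and blended (f1h < cut); f1h == cut falls in neither."""
    clean = [c for c in clusters if c[2] > cut]
    blended = [c for c in clusters if c[2] < cut]
    return clean, blended
```

Keeping raw counts rather than normalized densities mirrors the choice made for the figure, so the number of objects in each row remains visible.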

The cluster likelihoods (shaded in the figure) are com- 
pared with the halo distributions for the same richness 
ranges, shown by solid lines. The top row shows all clusters, 
while the middle and bottom rows separate the samples into 
clean and blended systems, respectively. Raw counts rather 
than normalized likelihoods are shown to give the number 
of objects. 

At z = 0.41, more than 90% of clusters in the chosen richness range have their dominant underlying halo contributing at least half of the galaxies. The mass distribution of the found clusters matches the underlying halo mass likelihood well. At higher redshift, the correspondence between halos and clusters weakens somewhat; the fraction of blends more than doubles, from < 10% at z = 0.41 to 22% at z = 0.99. The blended systems contribute a low-mass tail to the halo mass likelihood. The central mass of the clean clusters remains at 1.2-1.3 × 10^14 h^-1 M_⊙ at all three redshifts, while the central mass of the blends drops from 7.1 × 10^13 h^-1 M_⊙ at z = 0.41 to 5.0 × 10^13 h^-1 M_⊙ at z = 0.99. Thus the ratio of central masses between the clean and blended clusters also increases with redshift.

Our classification of clean versus blended clusters is based on a somewhat arbitrary cutoff of 0.5 in member fraction. Fig. 8 provides a more complete picture by plotting the cumulative fraction of clusters that have top-ranked halo member fraction below a given f_1h. Here the same observed cluster richness limits as in Fig. 7 are used. Cutting at f_1h > 0.5, the vertical line, gives the clean fractions quoted in Fig. 7. Analogues for other definitions of "clean fraction" in terms of f_1h can be read off as well. There is a clear trend with redshift, with clusters at z = 0.99 being less well matched to halos than those at z = 0.41. The median value of f_1h tells a similar story, decreasing from ~0.8 at z = 0.41 to ~0.7 at z = 0.99. Blending clearly increases at larger redshift.
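The curves in Fig. 8, the clean fraction and the median f_1h are all simple summaries of the list of top-ranked member fractions. A minimal sketch, assuming the f_1h values of the clusters in a richness window are already in hand (the function names are hypothetical):

```python
import bisect
import statistics

def cumulative_fraction(f1h_values, thresholds):
    """Fraction of clusters whose top-ranked halo member fraction f_1h is < each threshold."""
    ordered = sorted(f1h_values)
    n = len(ordered)
    # bisect_left returns how many sorted values lie strictly below t
    return [bisect.bisect_left(ordered, t) / n for t in thresholds]

def summarize_f1h(f1h_values, cut=0.5):
    """Return (clean fraction with f_1h > cut, median f_1h)."""
    clean = sum(1 for f in f1h_values if f > cut) / len(f1h_values)
    return clean, statistics.median(f1h_values)
```

Evaluating `cumulative_fraction` on a fine grid of thresholds traces one curve of the figure; the value at 0.5 is the blend fraction, and other clean-fraction definitions correspond to other thresholds.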

Going to a higher central mass gives similar trends. For example, centering on a richness corresponding to an average halo mass of 1.5 × 10^14 h^-1 M_⊙ at all redshifts gives a clean fraction of 90% at redshift 0.41, which decreases to 76% at redshift 0.99 for the same Δ_ρ as above. (Δ_ρ can be increased for higher richness to improve both numbers, but the increase of blends at high redshift remains.)



3.3 Causes and trends for blends 

Several effects cause an increasing incidence of blends at higher redshift. First, the change of observed color with distance is weaker; second, the red sequence is wider. Both make the color-magnitude cut select galaxies from a thicker slice along the line of sight. These appear to be the strongest causes and were illustrated in an earlier figure.

Another way of seeing the effect of color/magnitude evolution is to remove it entirely at z = 0.41; the background level then increases and the contrast between the clusters and the background declines. Lowering Δ_ρ to obtain the same number of clean clusters in the fixed mass range of Fig. 7, we find that the level of blends increases to ~20%, very close to what is seen at z = 0.99. Conversely, one can try to increase the clean fraction at z = 0.99 by imposing the z = 0.41 color evolution on the z = 0.99 population. In this case, however, the number of non-red-sequence galaxies brought into the red sequence through our evolution increases strongly, limiting the degree to which blends can be reduced.

A third contributing factor is that, at earlier times, the mass function is steeper, so the number of possible interloper halos per target halo (of mass ~10^14 h^-1 M_⊙, for example) grows at high redshift. The impact of these additional intermediate-mass halos is enhanced because the central galaxy magnitude is less well correlated with host halo mass at z = 0.99 than at low redshift. Over time, central galaxies in massive halos grow and brighten via mergers, leading to a stronger correlation between z-band magnitude and halo mass. Our cluster-finding algorithm works in descending order of luminosity. At low redshift, the luminosity sorting corresponds well to a sorting in halo mass but, at high redshift, more low-mass systems are mixed into the range of central galaxy magnitudes occupied by high-mass halos.
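To make the role of luminosity ordering concrete, here is a hypothetical sketch of a greedy, luminosity-ordered circular overdensity search. It is not the paper's exact algorithm: we assume that the galaxies have already passed the red-sequence color-magnitude cut, that the threshold is an enclosed surface density of at least Δ_ρ times the mean background, that each centre keeps the largest radius satisfying it, and that a minimum of eight members is required; all names are illustrative.

```python
import math

def circular_overdensity_clusters(galaxies, sigma_bg, delta_rho=7.0, min_members=8):
    """Greedy circular-overdensity finder (hypothetical sketch, not the paper's code).

    galaxies: list of (x, y, luminosity) in projected comoving coordinates.
    sigma_bg: mean background surface density of the selected galaxies.
    Centres are tried in descending luminosity; around each, the largest radius
    whose enclosed surface density is >= delta_rho * sigma_bg defines the members.
    """
    order = sorted(range(len(galaxies)), key=lambda i: -galaxies[i][2])
    assigned = set()
    clusters = []
    for c in order:
        if c in assigned:
            continue
        xc, yc, _ = galaxies[c]
        # distances from this centre to every galaxy not yet in a cluster
        dists = sorted(
            (math.hypot(galaxies[j][0] - xc, galaxies[j][1] - yc), j)
            for j in range(len(galaxies)) if j not in assigned
        )
        best = 0
        for k, (r, _) in enumerate(dists, start=1):
            if r == 0.0:
                continue  # zero radius encloses no area; skip the centre itself
            if k / (math.pi * r * r) >= delta_rho * sigma_bg:
                best = k  # keep the largest radius that still passes the threshold
        members = [j for _, j in dists[:best]]
        if len(members) >= min_members:
            clusters.append(members)
            assigned.update(members)
    return clusters
```

Because centres are visited strictly by luminosity, a lower-mass system whose central galaxy is as bright as that of a massive halo is grown first and, if it passes the membership cut, claims the galaxies it encloses, which is the high-redshift confusion described above.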

Because these factors are fairly generic, the trend toward more blends at z = 0.99 appeared in all the cases we considered: changing the definition and tightness of the red sequence, changing N_gal cuts and changing the spherical overdensity requirement. For a wide range of density cuts and modeling choices, the blends have roughly half the mass of the clean matches at z = 0.41, and this mass scale declines at higher redshift.
Table 2. Expected Cluster X-ray Properties.
4 IMPLICATIONS

Since blended clusters are associated with lower mass halos, 
they will be evident in follow-up studies as such. Their mean 
lensing signal, X-ray luminosity and temperature, and ther- 
mal SZ decrement should be low relative to clean systems. 
Spectroscopic signatures of substructure, in the form of mul- 
tiple peaks or other departures from Gaussianity, would also 
be likely in these systems. The imprecise centering of the 
multiple components along the line-of-sight would tend to 
flatten the radial number density profile. 

Table 2 provides estimates of the soft-band X-ray luminosity of our MS blended and clean clusters with richness 18 ± 4 (now fixed across redshifts), compared to values for halos of the same richness. We assume a power-law relation of the form L ∝ (M/10^14 h^-1 M_⊙)^1.6 (Stanek et al. 2006), and quote values normalized, arbitrarily, to the luminosity of a 10^14 h^-1 M_⊙ halo at each epoch. We also assume scatter in the mass-luminosity relation, σ_lnM = 0.4, and combine this with the dispersion in mass for the chosen richness range (Fig. 7) to give the dispersion in luminosity, σ_lnL. Lower values have been suggested for this intrinsic scatter (Reiprich & Böhringer 2002), but the scatter in mass at fixed N_gal dominates the intrinsic L-M scatter anyway.
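Converting the mass distribution at fixed richness into luminosity statistics of the kind listed in Table 2 can be sketched with a Monte Carlo draw. As a simplification of our own, we assume that each cluster's top-ranked halo mass is perturbed by independent log-normal scatter with σ_lnM = 0.4 before applying L ∝ (M/10^14 h^-1 M_⊙)^1.6; the function name and sampling scheme are illustrative.

```python
import math
import random
import statistics

def luminosity_stats(masses, slope=1.6, sigma_lnm=0.4, draws=200, seed=1):
    """Mean L (in units of L at 1e14 h^-1 Msun) and the dispersion of ln L.

    masses: top-ranked halo masses (h^-1 Msun) of clusters in one richness bin.
    Each mass is scattered log-normally (sigma_lnm) `draws` times and mapped
    through L = (M / 1e14)**slope; the scatter model is an assumption.
    """
    rng = random.Random(seed)
    ln_l = []
    for m in masses:
        for _ in range(draws):
            ln_m = math.log(m / 1e14) + rng.gauss(0.0, sigma_lnm)
            ln_l.append(slope * ln_m)
    mean_l = statistics.fmean(math.exp(x) for x in ln_l)
    return mean_l, statistics.pstdev(ln_l)
```

With σ_lnM = 0.4 alone the returned dispersion is close to 1.6 × 0.4 = 0.64; spreading the input masses, as happens within a richness bin, increases it further.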

The clean clusters have mean X-ray luminosities that tend to be slightly higher than the corresponding values for halos of the same richness. The blended systems are substantially dimmer: by a factor of two in the mean at z = 0.41, growing to a factor of three at z = 0.99.

Blends should be a generic outcome of red-sequence-based cluster-finding methods, and there are indications of this from initial X-ray and dynamical observations of the RCS clusters. In Chandra observations of 13 clusters at 0.6 < z < 1.0, Hicks et al. (2005, 2007) confirm 12 as X-ray sources at 3σ significance, suggesting that > 90% of the cluster candidates are massive structures with deep gravitational potential wells (see also Blindert et al. 2007). However, their X-ray luminosities were systematically lower at a given cluster richness than seen for lower-redshift X-ray-selected clusters. Most of the clusters lay on a sequence only slightly offset from the expected L_X-richness relation, but several clusters were significantly offset. Optical spectroscopy of one of these clusters (at z = 0.9) showed that it consisted of several structures which are dynamically discrete but whose red sequences overlapped in the survey data (Gilbank et al. 2007), precisely the sort of blended system expected from the study here (see also van Breukelen et al. 2007). Evidence for large scatter between X-ray luminosity and optical richness has been seen in, e.g., Yee & Ellingson (2003); Gilbank et al. (2004); Lubin, Mulchaey & Postman (2004); Hicks et al. (2005); Barkhouse et al. (2006).



Instead of using only the top-ranked halo mass to determine the X-ray signal, we can sum the luminosity of
Figure 9. "Stacked" profiles of clusters with f_1h > 0.5 divided by those with f_1h < 0.5 for the Δ_ρ = 7.0 case and z = 0.41. The case here is representative; the trend of the ratio with radius was seen at all redshifts and for all color cuts. Stacking after rescaling by the outer radius gave similar results.



all contributing halos. In this case, all the cluster luminosi- 
ties go up, with the clean subset increasing by roughly 0.3 
and the blended subset increasing by a larger amount. Then 
the ratio of clean to blended mean luminosities changes to 
~ 1.2 at low redshift and to ~ 2.4 at high redshift. The 
luminosity measured by X-ray observation will depend on 
details of the projected spatial arrangement, the noise char- 
acteristics and other details that lie beyond the scope of 
this investigation. It seems reasonable to consider the val- 
ues quoted for the single halo case as a lower bound, and 
the values from summing all halos as an upper bound, on 
what would be observed. 
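The two bounds can be written down directly for a single cluster. A minimal sketch, assuming the same L ∝ M^1.6 scaling and a hypothetical list of (number of members contributed, halo mass) pairs for the halos feeding one cluster; the top-ranked halo is taken to be the one contributing the most members:

```python
def luminosity_bounds(contributors, slope=1.6, pivot=1e14):
    """Lower/upper bounds on one cluster's X-ray luminosity, in units of L(pivot).

    contributors: list of (n_members, halo_mass) for every halo contributing
    members to the cluster (hypothetical input layout).
    Lower bound: the top-ranked (most members) halo only.
    Upper bound: the sum over all contributing halos.
    """
    top_mass = max(contributors, key=lambda c: c[0])[1]
    lower = (top_mass / pivot) ** slope
    upper = sum((m / pivot) ** slope for _, m in contributors)
    return lower, upper
```

Averaging each bound separately over the clean and blended subsets gives the two versions of the clean-to-blend luminosity ratio discussed above.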

Another difference between clean and blended systems is in their radial cluster profiles. Stacked profiles of the clean and blended clusters are used to produce the density profiles, ρ(r) ∼ N(r)/(r² dr), shown in Fig. 9. The clean clusters have a significantly steeper mean density profile than the blends. This result suggests that a matched angular filter approach (Postman et al. 1996) could offer improvements, particularly one that includes radial distance information from photometric redshifts or colors (White & Kochanek 2002). Observations of colors as a function of distance from the cluster center (e.g. Ellingson et al. 2001; Blindert et al. 2004) and of other properties (e.g. De Lucia et al. 2004) are already in place at high redshifts. Going further down the luminosity function would provide more galaxies to trace out the profile, but at the risk of including more faint background galaxies redshifted into the color region.
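The ratio plotted in Fig. 9 can be sketched as follows, assuming each cluster is represented by a list of member distances from its centre and using projected annuli (the annulus geometry and function names are our assumptions). Because clean and blended stacks share the same bins, any geometric factor, whether r dr or r² dr, cancels in the ratio.

```python
import math

def stacked_density(cluster_radii, edges):
    """Mean member count per unit area in annuli [edges[i], edges[i+1]), stacked over clusters.

    cluster_radii: list of lists; each inner list holds member distances from one
    cluster's centre, in the same comoving units as edges.
    """
    n_clusters = len(cluster_radii)
    dens = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        count = sum(1 for radii in cluster_radii for r in radii if lo <= r < hi)
        area = math.pi * (hi * hi - lo * lo)  # projected annulus area (assumption)
        dens.append(count / (n_clusters * area))
    return dens

def profile_ratio(clean_radii, blend_radii, edges):
    """Clean-to-blend stacked density ratio per radial bin (inf where the blend bin is empty)."""
    clean = stacked_density(clean_radii, edges)
    blend = stacked_density(blend_radii, edges)
    return [c / b if b > 0 else float("inf") for c, b in zip(clean, blend)]
```

A ratio that falls with radius, as in Fig. 9, is the signature of a steeper clean-cluster profile.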

The interlopers in both clean and blended clusters lie (slightly) more frequently in the background than in the foreground, as expected from Gladders & Yee (2000). There does not seem to be a strong trend in the moment of inertia for clean versus blended clusters; the blends are often projections rather than objects which are merely unrelaxed.

We also considered using i − z color at high redshift, rather than r − z, and found a similar blend fraction, even though the red sequence turns out to be narrower. This is because the observed color of red-sequence galaxies (now defined with respect to i − z) still evolves very slowly with redshift; thus, as with r − z color at high redshift, many galaxies do not evolve out of the red sequence even when far from the cluster center. Similarly, the number of non-red-sequence galaxies evolving into the selection window remains small across the ±250 h^-1 Mpc projected length.

As mentioned earlier, blends can be reduced immediately by increasing the spherical overdensity criterion Δ_ρ, but only at the cost of losing true halos as well. An increase in Δ_ρ also shifts the mass-richness relation to lower values of N_gal compared to the intrinsic case, and decreases the number of clusters found at fixed N_gal. These trends reflect the usual trade-off between purity and completeness for cluster samples; for more discussion see, e.g., the appendix of White & Kochanek (2002) and the appendix of this paper.



5 DISCUSSION 

In the above analysis, we have found properties and trends 
for blends as a function of redshift. Some of these results 
depend on particular details of the Millennium Simulation 
and our method, and some are likely to be general. 

Most of the increase in blends at z ≈ 1 comes from the slower change of color with increasing redshift. This color change was not obtained directly from the Millennium Simulation but from a simple stellar population synthesis model that reproduces observations, so we expect this result to be general. Our implementation of the color change with redshift is crude, but the candidate high-redshift interlopers are mostly red-sequence galaxies, for which our approximation is best expected to hold. As a result, we do not expect more detailed color implementations, such as mock light cones (e.g. Kitzbichler & White 2007 for the MS), to produce substantially different local (±100 h^-1 Mpc) projected contamination.

The increased width of the red sequence at high redshift is derived from the Millennium Simulation. However, at z = 0.99, the weak color evolution combined with the deep "green valley" separating the red and blue populations in the MS means that our results are reasonably insensitive to the precise width. Most of the interloper galaxies are themselves members of the red sequence in their respective projected halos. The r − z color shift for a ±250 h^-1 Mpc projection at z = 0.99 is −0.1 and +0.18, so only by compressing the red sequence to a width well below these values would one have an appreciable effect on the blended fraction.

The relative numbers of interloper halos at different redshifts are set by the underlying dark matter power spectrum and linear growth rate. For a fixed target mass, more interloper halos at higher redshift are expected generically. Physically, if we look at the line-of-sight distribution of the contaminating material, we find that the contaminating mass at large distances (> 50 h^-1 Mpc) more than doubles between redshift 0.41 and 0.99. This enhanced contamination from large distances also holds on a cluster-by-cluster basis: the fraction of clusters with more than, e.g., 30% of their material coming from > 40 h^-1 Mpc grows significantly with redshift. This material is far outside the cluster virial radius, and is not just material that was "caught" by the cluster finder before it fell in to become truly part of the cluster. Note that superclusters of very large size have been seen at these redshifts; see, for example, recent studies of superclusters by Gal, Lubin & Squires (2005) and Nakata et al. (2005).
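The cluster-by-cluster statistic above can be sketched as a simple count. Here we assume, as a simplification, that "material" is measured by member galaxies, each carrying its absolute line-of-sight separation from the cluster's top-ranked halo; a mass-weighted version would replace the counts with summed halo masses. The function name and input layout are hypothetical.

```python
def far_contamination_fraction(clusters, d_far=40.0, frac_cut=0.3):
    """Fraction of clusters with more than `frac_cut` of their members lying
    farther than `d_far` (h^-1 Mpc) along the line of sight from the top-ranked halo.

    clusters: list of lists of |line-of-sight separation| per member (assumed layout).
    """
    if not clusters:
        return 0.0
    n_hit = 0
    for seps in clusters:
        far = sum(1 for d in seps if d > d_far)
        if far > frac_cut * len(seps):
            n_hit += 1
    return n_hit / len(clusters)
```

Comparing the returned fraction between the z = 0.41 and z = 0.99 catalogues is one way to quantify the growth of distant contamination.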

There are other possible interlopers as well. For instance, adding galaxies that lie outside the MS volume can only increase the amount of blended contamination. Also, at faint magnitudes, the increasing numbers of background blue galaxies available to redshift into the red sequence are a potential cause for concern; increasing numbers of blue galaxies are observed at high redshift (e.g. Ellis 1997). (There are observational techniques to remove many of them, which we do not include here.) We saw only a small fraction of candidate interlopers from galaxies outside the red sequence, except at low redshift, where the green valley is highly compressed (see Fig. 2). This is reassuring, as the observed color and magnitude evolution of these galaxies was approximated to be the same as for red-sequence galaxies; because these objects contribute little at high redshift, we expect errors in interloper numbers arising from this approximation to be small there. In fact, removing the observed color and magnitude evolution of the non-red-sequence galaxies entirely (and adjusting Δ_ρ to get the same value of n_Δρ) gives clean fractions similar to those of our fiducial model.

Extending our analysis of the MS (tuning the cluster finder to cluster color profiles in more detail, for example) requires further developments. The MS galaxy catalogues rely on sophisticated physical models for properties such as star formation histories, the initial mass function, stellar population synthesis, dust production and radiative transfer, and the sensitivity of all of these to local conditions. The resulting MS catalogues match observations of a large number of properties (e.g. Springel et al. 2005; Lemson et al. 2006; Croton et al. 2006; Kitzbichler & White 2007) at many redshifts. A detailed, multi-color comparison to observed galaxy number counts is given in Kitzbichler & White (2007). Some departures from observations are noted there, in particular the over-prediction of the abundance of moderately massive galaxies at high redshifts, notably z > 1.

For our cluster finding, the only properties used are the galaxy locations and their fluxes in two filters. Tuning the cluster finder to more specific properties of these fluxes in the MS (for example, their radial trends within clusters) will require higher-fidelity galaxy formation models. Work is in progress to improve the model's match to observations. For example, known issues in the MS models under study include, for low-redshift clusters, a faint red satellite excess, an excess tail of bright blue objects (but with overall blue satellite fractions too low), and no "green valley" (Weinmann et al. 2006a; De Lucia 2007; De Lucia et al. 2007). We find a red sequence with the wrong sign for the color-magnitude slope (the brightest galaxies tend to be slightly bluer than the fainter ones; see Fig. 1); similarly, the blue fraction increases towards brighter luminosity and has the wrong radial evolution within clusters at our three redshifts. In addition, the simulation was run with the earlier WMAP parameters rather than the current best-fit WMAP3 cosmology (Spergel et al. 2006).

Future improvements in optical cluster finding will require simulated catalogues that are in better quantitative agreement with the growing body of deep galaxy surveys. Refining and using more sophisticated color finders does not necessarily require all the physics employed in the MS, which aims to explain a multitude of observational properties rather than simply reproduce them. Purely statistical mock catalogues can be built on halo model fits tuned empirically to data in the redshift range of interest. The catalogues can focus narrowly on observational properties relevant to the search algorithm. In particular, accurate colors for galaxies in groups and filaments in the outskirts of clusters are critical to a quantitative prediction of the amount of contamination in color-selected surveys, as red-sequence galaxies in these regions are the most likely source of interlopers.

Such survey-specific catalogues are crucial to understanding selection functions, in part because they allow search algorithms to be tuned to the cluster "color footprint" [...] profitably used at low redshifts (e.g. Kochanek et al. 2003; Eke et al. 2004; Yan, White & Coil 2004; Yang et al. 2005; Miller et al. 2005; Weinmann et al. 2006b; Koester et al. 2007). At high redshift, data sets large enough to tune such catalogues are just coming into being; combined with modeling improvements in recent years, the construction of such catalogues is now a feasible task.

However, even without such a catalogue in hand, our primary effect is simple to illustrate: the spatial cut provided by the observed color and magnitude cut widens as redshift increases. Conversely, a narrow spatial cut reduces the blends strongly. For example, taking an exact spatial cut for the MS, with boxes 100 h^-1 Mpc wide at all three redshifts, the clean fraction becomes almost 100% at low redshift and 95% at high redshift. (Presumably the remaining blends are due to the other contributing factors mentioned above.) A slice this thick would correspond to a fine redshift selection, Δz = 0.06 (0.04) at redshift 0.99 (0.41). This level of accuracy is potentially attainable with next-generation photometric redshifts.
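The quoted Δz values follow from the local relation Δz ≈ H(z) Δχ / c for a comoving slab of width Δχ. A minimal check, assuming a flat ΛCDM background with the Millennium Simulation's Ω_m = 0.25 and Ω_Λ = 0.75 and neglecting radiation (the function names are ours):

```python
import math

C_KM_S = 299792.458  # speed of light in km/s

def hubble_e(z, omega_m=0.25, omega_l=0.75):
    """E(z) = H(z)/H0 for a flat LCDM model, radiation neglected."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + omega_l)

def slab_dz(z, width_mpc_h, omega_m=0.25, omega_l=0.75):
    """Redshift extent of a comoving slab `width_mpc_h` (h^-1 Mpc) wide centred at z.

    Uses dz = H(z) dchi / c with H0 = 100 h km/s/Mpc, so h cancels; valid for
    slabs thin compared with the Hubble length.
    """
    return hubble_e(z, omega_m, omega_l) * 100.0 * width_mpc_h / C_KM_S
```

For a 100 h^-1 Mpc slab this gives Δz of about 0.055 at z = 0.99 and 0.040 at z = 0.41, in line with the values quoted above.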



6 CONCLUSIONS 

With the advent of wide-field imagers, optical searches have become a powerful way to compile large samples of high-redshift clusters. Key to these techniques is the use of multi-color information to reduce the line-of-sight contamination that plagued earlier, single-filter observations (Abell 1958; Dalton et al. 1992; Lumsden et al. 1992; White et al. 1999). Two-filter information provides only limited redshift filtering, and this paper begins to explore what types of objects are selected by such techniques, and how this selection evolves with redshift.

We use a simple circular overdensity search algorithm on local sky projections of the galaxy population of the Millennium Simulation, tuned using knowledge of the red sequence present in simulated halos with eight or more galaxies brighter than L*/2 in the z-band. The free parameter, the density contrast Δ_ρ, is tuned to balance purity and completeness, and the choice Δ_ρ = 7 produces a number of clusters as a function of galaxy richness that is close to the underlying richness function of halos.

We find that essentially all clusters have some degree of projected contamination; a cluster of optical richness N_gal typically has red-sequence members from N_gal/4 halos along the line of sight. In the large majority of cases, the contamination is not dominant, and most of a cluster's members are associated with a single, massive halo. A minority are highly blended cases in which projected contamination is dominant, and no single halo contributes a majority of the cluster's members.

We find an increased fraction of blends with redshift. Although several factors contribute, the most important appears to be the weaker evolution in the observed color of red-sequence galaxies with increasing redshift. This effectively increases the path length searched by the red-sequence color cut, leading to a larger cross-section for accidental line-of-sight projections. In addition, at higher redshift the number of ~3 × 10^13 h^-1 M_⊙ halos relative to 10^14 h^-1 M_⊙ halos is larger, and the red magnitudes of central galaxies at these two mass scales are more similar.

The blends add a low-mass tail to the halo mass selection function for clusters of fixed optical richness. For our found clusters with optical richness targeting 10^14 h^-1 M_⊙ halos, we expect that ~10% of these systems would be underluminous in X-rays by a factor of two at z = 0.41, growing to ~20% underluminous by a factor closer to three at z = 0.99. The scatter in individual X-ray luminosities for the complete set of clusters is expected to be large, σ_lnL ≈ 1.2 at high redshift, and there is considerable overlap in the distributions of L_X expected for clean and blended clusters. It should be noted that, observationally, high-redshift low-luminosity systems are also likely to have lower signal-to-noise.

The galaxy number density profiles are slightly shallower for blends than for clean clusters, and a matched spatial filter approach may help identify and eliminate the former. Since some fraction of halos, especially those undergoing mergers, will also be spatially extended, careful study of the effect of spatial filtering on halo completeness is needed. Alternatively, instead of decreasing the number of blends in searches, our findings suggest modeling the mass likelihood p(M|N_gal, z) as a bimodal log-normal distribution, with the fraction of blends, and the location and width of that component, included as nuisance parameters. This expected bimodal distribution can be incorporated into error estimates for cluster number counts as a function of redshift, for instance, along with other expected errors (such as the 5-10% scatter in red-sequence-based redshift estimates; Gilbank et al. 2007).
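A minimal sketch of such a bimodal log-normal likelihood follows. The parametrization (a clean component with log-mean ln M_clean and width σ_clean, plus a blend component of weight f_blend offset by ln_shift in ln M with its own width σ_blend) is our illustrative choice; only the general form, with the blend fraction, location and width as nuisance parameters, comes from the text.

```python
import math

def lognormal_pdf(m, mean_ln, sigma_ln):
    """Log-normal probability density in M (not in ln M)."""
    z = (math.log(m) - mean_ln) / sigma_ln
    return math.exp(-0.5 * z * z) / (m * sigma_ln * math.sqrt(2.0 * math.pi))

def bimodal_mass_likelihood(m, ln_m_clean, sigma_clean, f_blend, ln_shift, sigma_blend):
    """p(M | N_gal, z) as a two-component log-normal mixture (illustrative sketch).

    f_blend, ln_shift (offset of the blend component in ln M; negative for a
    low-mass tail) and sigma_blend act as nuisance parameters.
    """
    clean = lognormal_pdf(m, ln_m_clean, sigma_clean)
    blend = lognormal_pdf(m, ln_m_clean + ln_shift, sigma_blend)
    return (1.0 - f_blend) * clean + f_blend * blend
```

For example, f_blend ≈ 0.2 with ln_shift = −ln 2 and σ_blend broader than σ_clean mimics a tail displaced to lower mass by roughly a factor of two, as found at z ≈ 1; in a counts analysis these three parameters would be marginalized over.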

Understanding the detailed color/magnitude trends within galaxy clusters is key to refining red-sequence cluster finding and improving its success rate. Fortunately, data sets in hand or on the way, combined with rapidly improving modeling methods, will improve our understanding of high-redshift colors and their evolution. This work will be driven largely by survey-specific mocks (current examples are those for 2MASS (Kochanek et al. 2003), the DEEP2 survey (Yan, White & Coil 2004), the 2dFGRS (Eke et al. 2004; Yang et al. 2005) and the SDSS (Miller et al. 2005; Koester et al. 2007; Weinmann et al. 2006b)), and such efforts will be necessary for mining the rich science provided by existing and future high-redshift cluster surveys.
APPENDIX

Purity and completeness are "success rates" used when one wants a catalogue of a certain type of object and has obtained, via some method, a catalogue of candidates. A classic definition starts with the number of objects which are both in the candidate set and in the desired target set, i.e. the intersection of these sets. Dividing the number in the intersection by the total number of target objects gives completeness, and dividing by the total number of candidate objects gives purity (or reliability). These definitions go back many years in radio astronomy. For instance, Condon, Balonek & Jauncey (1975) used them to describe how well optical sources were matched to radio sources as a function of search aperture radius. In our case, target objects are halos, defined in terms of true richness or mass, and candidate objects are clusters, defined in terms of observed richness.
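The classic definitions translate directly into set operations. A minimal sketch, assuming the match between catalogues is supplied as a set of (target id, candidate id) pairs (a hypothetical representation); distinct ids are counted so that a target matched to several candidates is not double counted:

```python
def purity_completeness(targets, candidates, matches):
    """Classic completeness and purity from an explicit intersection.

    targets: set of target ids (halos); candidates: set of candidate ids (clusters);
    matches: set of (target_id, candidate_id) pairs judged to be the same object.
    """
    overlap = {(t, c) for t, c in matches if t in targets and c in candidates}
    completeness = len({t for t, _ in overlap}) / len(targets)
    purity = len({c for _, c in overlap}) / len(candidates)
    return completeness, purity
```

With a strictly one-to-one matching the two numerators coincide, recovering the single "number in the intersection" of the definition above.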

While these terms have a long history, it is not clear that such definitions provide the 'single number' one wants to characterize the success of a cluster-finding algorithm. If the properties of the sample change slowly with, e.g., richness or mass, we may not wish to impose a hard threshold on richness when computing purity: finding a cluster with 19 members may be just as good as requiring 20. We consider several generalizations of the classic notions of purity and completeness below.

We note that there are several choices in all of these definitions: the two catalogues, including their underlying data samples and the search algorithms employed, and the definition of which clusters lie in the intersection. Even focusing on the circular overdensity method, as we do here, there is considerable latitude in defining both the target catalogue (e.g., the specific definition of a halo, or the use of halo richness or mass as the order parameter) and the cluster candidate sample (minimum observed cluster richness, choice of Δ_ρ). We consider here how purity and completeness vary with Δ_ρ.
Figure 10. Purity and completeness as a function of overdensity contrast Δ_ρ in Eq. 1 for z = 0.41, 0.69, 0.99, top to bottom. The target catalogue is halos with 24 < N_true < 100 and the candidate clusters have 24 < N_obs. The number of objects in the overlap of the two catalogues is divided by the total number of objects in the target halo catalogue to get completeness (rising lines) and by the total number of objects in the candidate cluster catalogue to get purity (falling lines). The overlap of the two catalogues is taken to be either target halos which contribute the most galaxies to a candidate cluster (solid line) or only target halos which contribute at least half of their cluster's galaxies (dashed line), i.e. only halos that match to clean clusters.



Figure 10 shows the classic definitions of purity and completeness applied to our catalogues for target halos with true richness 24 < N_true < 100 and candidate clusters with observed richness N_obs > 24. We define a cluster and its halo to be in the overlap of the two catalogues if the halo in the target set contributes the most galaxies to a cluster in the candidate set. A more restrictive definition requires the halo to contribute more than half of the galaxies in a given cluster (f_1h > 0.5). In both cases, one divides the overlap number by the total number of target halos (completeness) and by the total number of candidate clusters (purity). In Fig. 10 these two cases (all f_1h and f_1h > 0.5) for the overlap set are shown as solid and dashed lines, respectively.
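The two overlap definitions can be sketched for matched halo-cluster catalogues as follows. We assume a hypothetical layout: halos as a mapping from halo id to N_true, and each found cluster as (N_obs, top-ranked halo id, f_1h). Completeness counts distinct matched halos while purity counts matched clusters; that split is our own choice for handling fragmented halos.

```python
def classic_rates(halos, clusters, n_min=24, n_max=100, require_clean=False):
    """Completeness and purity for targets n_min < N_true < n_max and candidates N_obs > n_min.

    halos: dict halo_id -> N_true.
    clusters: list of (N_obs, top_halo_id, f1h) for every found cluster.
    Overlap: a candidate whose top-ranked halo is a target halo, optionally
    also requiring f1h > 0.5 (the restrictive, clean-only definition).
    """
    targets = {h for h, n in halos.items() if n_min < n < n_max}
    candidates = [c for c in clusters if c[0] > n_min]
    matched = [c for c in candidates
               if c[1] in targets and (not require_clean or c[2] > 0.5)]
    completeness = len({c[1] for c in matched}) / len(targets)
    purity = len(matched) / len(candidates)
    return completeness, purity
```

Calling it with `require_clean=False` and `True` gives the analogues of the solid and dashed curves at a single Δ_ρ.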

We show Fig. 10 as a function of decreasing overdensity threshold, Δ_ρ, because this mimics a search region of increasing radial scale. At high Δ_ρ, purity is maximized because the cluster sample is selecting the dense cores of the most massive halos. As the threshold is decreased, the enlarged search area and lower intrinsic density within the halo increase the frequency of best-matched halos which are below the target richness threshold, lowering the purity of the cluster sample. In contrast, the completeness grows with lower Δ_ρ, as the increasing search scale matches and then exceeds the radial scale used to define the halo population. The number of observed clusters at fixed richness increases rapidly with decreasing Δ_ρ, improving the odds of completely matching to the massive halo sample. The purity and completeness curves cross at roughly our chosen threshold, Δ_ρ ≈ 7.
Figure 11. Scatter between true halo richness (x-axis) and observed cluster richness (y-axis), for blends (top) and clean clusters (bottom), for z = 0.41, 0.69, 0.99 (left to right), summed over all three projection axes. Even for the clean clusters the scatter is extremely large. Contours differ by 10, starting at 2, and the pixel size is 2.5. The straight line is N_true = N_obs.

These measures of purity and completeness are substantially
lower than unity because, as shown in Fig. 11, there is
substantial scatter between N_true and N_obs for matched
halo-cluster pairs. This should not be taken as a failure of
the algorithm, because the source of impurity and
incompleteness, the scatter, is largely understood. There are a
number of ways to take this scatter into account, and the
optimal method depends strongly on the intended use of the
catalogue. Different target/candidate sets will vary in purity
and completeness, driven by the form of the scatter. To
characterize this, one could, for example, use N_true > N,
N_obs > N − δ to calculate completeness, and N_true > N − δ,
N_obs > N to calculate purity. If one approximates the scatter
as Poisson, an obvious choice for N = 25 would be
δ = √N = 5. For N = 24, using δ = 5 raises the fractions for
completeness and purity by ~0.1 to 0.2 (the largest change
occurs at values of Δ_p where completeness or purity is low,
the smallest where it is high), while using δ = 10 roughly
doubles the effect. The purity and completeness curves still
cross around Δ_p ~ 7 (slightly lower for z = 0.41), but at a
higher fraction (for z = 0.41 it rises from ~0.7 to ~0.85 and
~0.95 for shifts of 5 and 10, respectively). Note that the
a priori arbitrary choice of δ should be motivated by some
understanding of how the sample properties change with the
property being used to define the sample.
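As a minimal sketch of these shifted-threshold definitions, assuming each target halo and each candidate cluster has already been assigned a single best-matched partner (with richness 0 if unmatched), the two fractions could be computed as follows. The function name and the toy richness values are hypothetical and do not come from the simulation.

```python
import math

import numpy as np


def completeness_purity(halo_ntrue, halo_nobs, clus_nobs, clus_ntrue, n_cut, delta=0.0):
    """Completeness and purity with a richness tolerance delta on the partner.

    halo_ntrue, halo_nobs: N_true of each halo and N_obs of its best-matched
        cluster (0 if the halo is unmatched).
    clus_nobs, clus_ntrue: N_obs of each cluster and N_true of its
        best-matched halo (0 if the cluster is unmatched).
    Completeness: fraction of halos with N_true > n_cut whose cluster has
        N_obs > n_cut - delta.
    Purity: fraction of clusters with N_obs > n_cut whose halo has
        N_true > n_cut - delta.
    """
    halo_ntrue = np.asarray(halo_ntrue, dtype=float)
    halo_nobs = np.asarray(halo_nobs, dtype=float)
    clus_nobs = np.asarray(clus_nobs, dtype=float)
    clus_ntrue = np.asarray(clus_ntrue, dtype=float)

    targets = halo_ntrue > n_cut      # target halo sample
    candidates = clus_nobs > n_cut    # candidate cluster sample
    completeness = np.mean(halo_nobs[targets] > n_cut - delta) if targets.any() else math.nan
    purity = np.mean(clus_ntrue[candidates] > n_cut - delta) if candidates.any() else math.nan
    return float(completeness), float(purity)


# Toy matched samples (invented numbers) with N = 25.
halo_ntrue, halo_nobs = [30, 28, 40, 26, 10], [27, 22, 45, 18, 12]
clus_nobs, clus_ntrue = [27, 45, 30, 26, 12], [30, 40, 22, 15, 10]
N = 25
print(completeness_purity(halo_ntrue, halo_nobs, clus_nobs, clus_ntrue, N))  # (0.5, 0.5)
# Poisson-motivated tolerance delta = sqrt(N) = 5.
print(completeness_purity(halo_ntrue, halo_nobs, clus_nobs, clus_ntrue, N, delta=math.sqrt(N)))  # (0.75, 0.75)
```

In this toy sample, loosening the partner threshold by δ = √N moves both fractions up, which mirrors the qualitative effect described above; the size of the shift depends entirely on the invented numbers.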

Extending the candidate sample when defining completeness,
and the target sample when defining purity, can be taken
further. For completeness, for example, we can take target
halos with N_true above some cut and consider all clusters
above the minimum richness threshold (which could be as low as
one). In our case the minimum richness threshold is N_obs > 8.
The differential form of completeness for N_obs > 8 and
Δ_p = 7 is shown in Fig. 4 in the main text. For purity one
can again reverse the limits. If one goes to N_true > 1, all
clusters will be matched to at least one halo, and thresholds
close to this will give similar results. A possibly more useful
definition of purity requires that the best-matched halo
contribute at least half of its partner cluster's members. This
definition of purity corresponds to the clean fraction shown in
Fig. 7.

Figure 12. Purity and completeness as a function of
overdensity contrast Δ_p for z = 0.41, 0.69, 0.99 (top to
bottom). Unlike Fig. 10, the target and candidate samples used
to define purity and completeness differ. For completeness
(increasing line), the target catalogue is halos with
20 < N_true < 100 and the candidate clusters are any found
cluster (i.e. N_obs > 8), both clean and blended. For purity
(decreasing line), the candidate catalogue comprises all
clusters with N_obs > 20, the target set is halos with
8 < N_true < 100, and a halo is only taken to be in the
intersection if it contributes the majority of galaxies to a
cluster, i.e. this purity is simply the fraction of clean
clusters, f_clean. Completeness decreases and purity increases
with increasing Δ_p.
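The asymmetric sample definitions used for Fig. 12 can be sketched as below. This is a minimal illustration under stated assumptions: the function name and inputs are hypothetical, whether each halo is matched to any found cluster (N_obs > 8) is assumed to be precomputed, and f_h denotes the fraction of a cluster's members contributed by its best-matched halo.

```python
import math

import numpy as np


def _fraction(mask):
    """Mean of a boolean array, or NaN if the selection is empty."""
    return float(mask.mean()) if mask.size else math.nan


def fig12_style_rates(halo_ntrue, halo_found, clus_nobs, clus_best_ntrue, clus_fh,
                      target_range=(20, 100), cand_min=20, purity_halo_range=(8, 100)):
    """Completeness and purity with different target and candidate samples.

    Completeness: fraction of halos with target_range[0] < N_true < target_range[1]
        that are matched to any found cluster (halo_found, assumed precomputed).
    Purity: fraction of clusters with N_obs > cand_min whose best-matched halo
        lies in purity_halo_range and contributes a majority (f_h > 0.5) of the
        cluster's members, i.e. the clean fraction.
    """
    halo_ntrue = np.asarray(halo_ntrue, dtype=float)
    halo_found = np.asarray(halo_found, dtype=bool)
    clus_nobs = np.asarray(clus_nobs, dtype=float)
    clus_best_ntrue = np.asarray(clus_best_ntrue, dtype=float)
    clus_fh = np.asarray(clus_fh, dtype=float)

    targets = (halo_ntrue > target_range[0]) & (halo_ntrue < target_range[1])
    completeness = _fraction(halo_found[targets])

    candidates = clus_nobs > cand_min
    halo_ok = (clus_best_ntrue > purity_halo_range[0]) & (clus_best_ntrue < purity_halo_range[1])
    clean = clus_fh > 0.5  # best-matched halo supplies a majority of members
    purity = _fraction((halo_ok & clean)[candidates])
    return completeness, purity


# Toy inputs (invented numbers).
print(fig12_style_rates(
    halo_ntrue=[25, 50, 120, 15, 30], halo_found=[True, True, True, False, False],
    clus_nobs=[22, 35, 10, 40], clus_best_ntrue=[30, 60, 12, 18],
    clus_fh=[0.8, 0.45, 0.9, 0.4]))
```

Because the target and candidate samples differ, the two numbers are no longer computed on a shared overlap set, which is why this purity reduces to a clean fraction.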

Fig. 12 shows the purity and completeness for these less
restrictive sample definitions, taking N = 20 and δ = 12.
The solid increasing line and the dashed decreasing line (with
decreasing Δ_p) are directly analogous to their counterparts in
Fig. 10. At a high surface-density threshold Δ_p, the N_obs > 8
cluster sample is incomplete with respect to 20 < N_true < 100
halos because the cores of some N_true > 20 halos fall below
the cluster richness limit of 8 members. As mentioned in the
main text, high-redshift, high-mass halos are more likely to be
disturbed than their lower-redshift counterparts, and thus to
fall below the overdensity threshold. The purity of the overall
cluster sample is very high at z = 0.4, but declines at higher
redshift, where halo blending is more severe.

As the threshold Δ_p is lowered, the fraction of halo
galaxies lying above the projected threshold increases, but the
potential for confusion by projection also increases. For
Δ_p = 4, the N_obs > 8 cluster sample is essentially 100%
complete for N_true > 20 halos at all redshifts. The overall
purity of the N_obs > 20 cluster sample is substantially lower,
dropping below 0.5 at z = 1.

Another way of choosing the two samples is pursued by
Rozo et al. (2007), who take the distribution of Fig. 11,
combining both blends and clean clusters, and identify strong
outliers (see their Fig. 2). They take a slice in a fixed N_obs
range to define purity (the number of non-outliers divided by
the total in the slice) and a slice in a fixed N_true range to
define completeness (again the number of non-outliers divided by
the total in the slice). How outliers are defined and which
slices in N_obs and N_true are taken will affect the detailed
results, and, again, the intended use of the catalogue needs to
be taken into account before deciding on the optimal choices.
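The slice-based definitions can be illustrated with a short sketch. The outlier criterion below, |N_obs − N_true| > k√N_true, is a hypothetical Poisson-like stand-in chosen for illustration and is not the criterion used by Rozo et al. (2007); the function name, the parameter k, and the toy data are likewise assumptions.

```python
import math

import numpy as np


def slice_rates(ntrue, nobs, obs_slice, true_slice, k=3.0):
    """Purity in an N_obs slice and completeness in an N_true slice.

    Each (ntrue[i], nobs[i]) is a matched halo-cluster pair, blends and clean
    clusters combined. A pair is flagged as an outlier with a hypothetical
    criterion |N_obs - N_true| > k * sqrt(N_true). Slices are half-open
    intervals [lo, hi).
    """
    ntrue = np.asarray(ntrue, dtype=float)
    nobs = np.asarray(nobs, dtype=float)
    outlier = np.abs(nobs - ntrue) > k * np.sqrt(ntrue)

    in_obs = (nobs >= obs_slice[0]) & (nobs < obs_slice[1])
    in_true = (ntrue >= true_slice[0]) & (ntrue < true_slice[1])
    # Non-outliers divided by the total in each slice (NaN if a slice is empty).
    purity = float(np.mean(~outlier[in_obs])) if in_obs.any() else math.nan
    completeness = float(np.mean(~outlier[in_true])) if in_true.any() else math.nan
    return purity, completeness


# Toy matched pairs (invented numbers).
ntrue = [25, 30, 26, 5, 40]
nobs = [27, 24, 60, 28, 10]
print(slice_rates(ntrue, nobs, obs_slice=(20, 40), true_slice=(20, 50)))
```

Changing k or the slice boundaries changes both numbers, which is the sensitivity to outlier definition and slice choice noted above.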

The difficulty in finding the best definition lies in trying to
characterize an entire joint distribution with two numbers
(purity and completeness). The full distribution of candidate
properties as a function of target halo properties is the key
information required to compute expectations for an
observational catalogue. The scatter is not a problem if its
shape is sufficiently well understood, and the accuracy required
in understanding this distribution depends on the specific use
of the catalogue. For instance, high purity (but not necessarily
high completeness) might be of interest when selecting high-mass
clusters for individual X-ray follow-up, whereas for a cluster
sample aimed at constraining cosmological parameters, scatter in
mass can be included in the analysis, but high completeness is
desirable to reduce statistical uncertainties. If one can
correct for scatter perfectly, obtaining high purity is then not
crucial; purity serves only to quantify the size of the
correction being applied to the data or the model. Which errors
are most important to avoid, and how well the selection function
is characterized, determine the best line of attack in the
trade-off between purity and completeness.
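One way to use the full joint distribution rather than two summary numbers is to tabulate the conditional probability P(N_obs | N_true) from matched pairs and forward-model an observed richness distribution from a predicted halo richness distribution. The sketch below assumes binned richnesses (the bin width of 2.5 simply echoes the pixel size of Fig. 11), uses hypothetical function names and invented toy data, and ignores unmatched halos and spurious clusters, which a real treatment would need to include.

```python
import numpy as np


def conditional_matrix(ntrue, nobs, edges):
    """P[i, j]: fraction of halos in N_true bin i whose matched cluster is in N_obs bin j.

    Rows with no halos are left as zeros.
    """
    counts, _, _ = np.histogram2d(ntrue, nobs, bins=[edges, edges])
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


def expected_observed_counts(halo_counts, P):
    """Expected number of clusters per N_obs bin, given halos per N_true bin."""
    return np.asarray(halo_counts, dtype=float) @ P


# Toy matched pairs (invented numbers), bins of width 2.5.
edges = np.arange(0.0, 10.1, 2.5)  # [0, 2.5, 5, 7.5, 10]
P = conditional_matrix([1, 1, 3, 3, 3, 6], [1, 3, 3, 3, 6, 6], edges)
print(expected_observed_counts([10, 30, 6, 0], P))
```

The matrix form makes the point of the paragraph above concrete: once P(N_obs | N_true) is known well enough, scatter becomes a correction applied to the model rather than a loss of information.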